ABSTRACT



Item mappings are widely used in educational assessment for 
applications such as test administration (through test form assembly and 
computer assisted testing) and for criterion-referenced (CR) interpretation 
of test scores or scale anchoring. Item mappings are also used to construct 
ordered item booklets in the CTB/McGraw Hill Bookmark standard setting 
procedure. Selection rules for mapping the items vary with the purpose of the 
mapping. The objective of this paper is to categorize various types of item 
mappings, to describe ways to assess the consequences of a given item 
selection rule for mapping a binary item, and to provide a general empirical 
Bayes framework from which specific selection rules can be derived. A
comparison is made on the maximum information (MI) rules and those derived 
from an empirical Bayes (EB) approach. It is noted that the EB rules coincide 
with the MI rules if the correction for guessing formula is used to extend 
the EB rules for Rasch and two parameter logistic items to the EB rules for 
three parameter logistic items. (Contains 13 references.) (Author/SLD) 






TM031300 ED 442 874 



On Item Mappings and Statistical Rules for Selecting Binary Items 

Item mappings are widely used in educational assessment for applications
such as test administration (through test form assembly and computer assisted 
testing, CAT) and for criterion-referenced (CR) interpretation of test scores or 
scale anchoring. Item mappings are also used to construct ordered item booklets 
in the CTB/McGraw-Hill Bookmark standard setting procedure. Selection rules
for mapping the items vary with the purpose of the mapping. The objective of 
this paper is to categorize various types of item mappings, to describe ways to 
assess the consequences of a given item selection rule for mapping a binary item, 
and to provide a general empirical Bayes framework from which specific selection 
rules can be derived. A comparison is made between the maximum information (MI)
rules and those derived from an empirical Bayes (EB) approach. It is noted that
the EB rules coincide with the MI rules if the correction for guessing formula
is used to extend the EB rules for Rasch and 2PL items to the EB rules for 3PL
items.



Introduction 

Locating an item on an achievement continuum (item mapping) is a
well-entrenched process in educational assessment. Applications of item mapping
may be found in criterion-referenced (CR) testing (or scale anchoring; Beaton
& Allen, 1992; Huynh, 1994, 1998), computer-assisted testing, test form
assembly, and in selecting items for ordered test booklets used in the Bookmark
standard setting process (Lewis, Mitzel, & Green, 1996). While item response
theory (IRT) models such as the Rasch and two-parameter logistic (2PL)
models traditionally place a binary item at its location, it has been argued (Huynh,
1998) that such mapping may not be appropriate in selecting items for CR
interpretation and scale anchoring.

The purpose of this paper is to describe the three types of item
mappings that are often used in applications. Attention will then be focused on the
selection of binary items for CR interpretation (or scale anchoring) and for the 
construction of ordered test booklets in the Bookmark standard setting process. 
A statistical framework will be provided for assessing the consequences of each 
selection procedure. Within an empirical Bayes context, specific selection rules 
will be formulated and compared with the maximum information (MI) rules 
derived by Huynh (1994, 1998). 

Three Types of Item Mapping 



There are different ways to map an item on an achievement continuum and 
the statistical rule for mapping depends on the specific purpose of the item 
mapping. An extensive discussion of the selection rules may be found in Huynh
(1998). Although item mappings can be categorized in many different ways, 
there are three major types that are being used in many applications. These 
three types differ in terms of context, application, statistical and psychometric 
framework, and implications. They are described as follows. 

Type 1: Item Mapping for Ability Estimation 



In the context of ability estimation, items are typically chosen to match the 
ability of the examinee and an ability estimate is obtained from the selected 
items. It is here that the Fisher information plays a major role. Items that 
minimize the (asymptotic) standard error of the estimated ability are those 
that maximize the Fisher information. Details about item mapping for this 
situation may be found in traditional textbooks on IRT such as the one by
Hambleton and Swaminathan (1985). Item selection in this case aims only at an
accurate estimation of the examinee’s ability and skips over all questions about
what the examinee can be expected to accomplish at this level of performance.
In general, let a, b and c be the item parameters of a three-parameter logistic 
(3PL) item. The probability of answering the item correctly is given as 



$$P(\theta) = c + (1 - c)\,\frac{\exp[Da(\theta - b)]}{1 + \exp[Da(\theta - b)]}$$



where D = 1.7 (the constant that brings the logistic function close to a normal
ogive). Then the item information is maximized at the ability

$$\theta_{\max} = b + \frac{1}{Da}\,\log\left[\frac{1 + (1 + 8c)^{1/2}}{2}\right].$$

In this formula, the symbol “log” represents the natural logarithmic function.
It may be noted that $\theta_{\max} = b$ when c = 0. Hence, for the purpose of ability
estimation, Rasch and 2PL items are mapped at their item locations. It may be
noted that at $\theta_{\max}$, the probabilities of answering a Rasch or 2PL item
correctly and incorrectly are both equal to 50%.
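
As a hedged numerical check of the Type 1 formulas, the following minimal Python sketch evaluates the 3PL response probability and compares the closed-form location of the information maximum with a crude grid search. The function names (`p3pl`, `info3pl`, `theta_max`) and the item parameter values are illustrative assumptions, not part of the original paper. The information function is written in the standard form for a binary item, the squared slope of P divided by the product PQ.

```python
import math

D = 1.7  # scaling constant that brings the logistic close to the normal ogive


def p3pl(theta, a, b, c):
    """Probability of a correct answer under the 3PL model."""
    z = D * a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def info3pl(theta, a, b, c):
    """Fisher information of a binary 3PL item: (dP/dtheta)^2 / (P * Q)."""
    p = p3pl(theta, a, b, c)
    logistic = (p - c) / (1.0 - c)
    dp = D * a * (1.0 - c) * logistic * (1.0 - logistic)  # slope of P
    return dp * dp / (p * (1.0 - p))


def theta_max(a, b, c):
    """Closed-form ability at which the 3PL item information peaks."""
    return b + math.log((1.0 + math.sqrt(1.0 + 8.0 * c)) / 2.0) / (D * a)


# Hypothetical item parameters, chosen only for illustration.
a, b, c = 1.2, 0.5, 0.25
peak = theta_max(a, b, c)

# Grid search over [b - 3, b + 3] in steps of 0.001 as an independent check.
grid = [b - 3.0 + 0.001 * i for i in range(6001)]
grid_peak = max(grid, key=lambda t: info3pl(t, a, b, c))

print("closed form:", round(peak, 3), "grid search:", round(grid_peak, 3))
```

With c set to 0 the same function returns b, which matches the statement that Rasch and 2PL items are mapped at their item locations.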

Type 2: Item Mapping for CR Interpretation and
Constructing Ordered Booklets for Bookmark Standard Settings

This process does not deal with ability estimation; rather it is concerned with 
ways to locate items at various points on a scale so that an expectation can be 
reasonably attached to that point in terms of what a subject can do at that point 
(Beaton & Allen, 1992). This approach follows closely the writings of Glaser
(1963) and Glaser and Nitko (1971) on criterion-referenced (CR) measures and 
their applications. It assumes the existence of an item pool and has the major 
function of providing a CR interpretation for selected points (anchor points) on 
the scale. The creation of an ordered test form for the CTB Bookmark standard 
setting process (Lewis, Mitzel, & Green, 1996) also falls under this situation of
item mapping. 

It has been stated (Huynh, 1994, 1998) that in the context of CR
interpretation and scale anchoring, it may not be appropriate to map the item at the
place where the (total) item information is maximized. Let $P(\theta)$ be the
probability of answering the item correctly and $Q(\theta) = 1 - P(\theta)$ be the probability of
answering the item incorrectly. Then a general form of the Fisher information
for a binary item is

$$I(\theta) = \left[\frac{\partial \log Q(\theta)}{\partial \theta}\right]^2 Q(\theta) + \left[\frac{\partial \log P(\theta)}{\partial \theta}\right]^2 P(\theta).$$

In this formula, the operator $\partial$ represents the partial derivative.

It may be noted from the above equation that the item information takes into
account both probabilities $P(\theta)$ and $Q(\theta)$. Thus, the place where the item
information $I(\theta)$ is maximized (the item location) reflects a description for the
entire item with both its correct and incorrect responses. Therefore, the item
location concept does not embrace any expectation regarding an examinee’s
performance on the item. Huynh (1998, p. 36) argues that, in a number of
situations, it may be more informative to focus on the location of each separate
response. The location of the correct response, for example, might serve as a
signal that an examinee located at this place would be “expected” to have the
skills underlying the item. This type of item response interpretation appears to
be more assertive than a neutral statement that an item is located at a given
place. It might therefore be more useful in situations such as attaching CR
interpretations to test scores, scale anchoring, and constructing ordered test
booklets for Bookmark standard settings. These applications often require
knowledge of what an examinee can be expected to perform at a given point on
the achievement continuum.

Using the Bock partition of item information (Bock, 1972) as a starting 
point, Huynh (1994, 1998) developed a statistical process to select items for CR 
and scale anchoring. This procedure results in placing a 3PL item at the ability 
where the probability of answering the item correctly is 

$$p_1 = (2 + c)/3.$$

For a Rasch or 2PL item, this probability is 2/3 or about 67%. 
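
To make the placement concrete, the sketch below solves for the ability at which a 3PL item reaches the probability (2 + c)/3. Setting c + (1 - c)L = (2 + c)/3, where L is the logistic part of the model, gives L = 2/3, so the location sits log(2)/(Da) above b whatever the value of c. The function names and parameter values are hypothetical and serve only as an illustration.

```python
import math

D = 1.7  # logistic scaling constant


def p3pl(theta, a, b, c):
    """Probability of a correct answer under the 3PL model."""
    z = D * a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def mi_cutoff_probability(c):
    """Cutoff probability of the maximum information (MI) rule."""
    return (2.0 + c) / 3.0


def mi_location(a, b):
    """Ability where the logistic part equals 2/3, i.e. its logit is log(2)."""
    return b + math.log(2.0) / (D * a)


# Hypothetical item: a four-option multiple-choice item.
a, b, c = 1.0, 0.0, 0.25
tau = mi_location(a, b)
print(round(tau, 4), round(p3pl(tau, a, b, c), 4), round(mi_cutoff_probability(c), 4))
```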

Type 3: Mapping Exemplary Items for Public Release

Here a number of anchor points (such as the Basic, Proficient, and Advanced
achievement levels of the National Assessment of Educational Progress, NAEP) have
already been defined, each with its own subpool of items. The purpose of 
“item mapping” in this case is to select a number of items that would illustrate 
the (pre-defined) meaning of that point. The selection of exemplary items to 
illustrate the three NAEP achievement levels (Bourque, 1997) appears to fall 
under this aspect of item mapping. For example, to be selected as an exemplary 
item for the Proficient level of the NAEP 1996 mathematics achievement, the 
item must meet the following two rules (among other things). 

Rule 1 (on content): The content of the item must match the content
of the operationalized description of the Proficient level.

Rule 2 (on probability): The probability of answering the item
correctly for a Proficient student must be greater than 51%. (Bourque,
1997, p. 384)

The first rule on content clearly implies the existence of a set of items appro- 
priate at the Proficient level. This pool is smaller than the total item pool and 
its size is expected to impact the cutoff probability set in the second rule. For 
example, if the Proficient item pool is large, one might impose a cutoff probabil- 
ity larger than 51%. On the other hand, if the Proficient pool is small, a smaller 
cutoff probability might be needed to select the exemplary items. It may be 
noted that the existence of an item pool at each ability location is not implied 
in the Type 2 item mapping. In this type of item mapping, an attempt is made 
to locate a subset of items at a given anchor point without any knowledge about 
the content of the selected items. 

The IRT literature is inundated with well-documented procedures for select- 
ing items for test administration or ability estimation (Type 1 of item mapping). 
Selection rules for exemplary items (Type 3 of item mapping), by and large, de- 
pend on the size of the item pool with content that is appropriate for a given 
anchor point. Thus it seems fair to state that Type 3 item mapping requires 
more than a statistical rule for item selection. 

Given the statistical nature of this paper, we will now focus only on selection 
rules for Type 2 item mappings. 

Assessing the Consequences of an Item Selection Rule 

Within the context of item response theory (IRT), mapping a binary item 
for CR interpretation is tantamount to replacing the item characteristic func- 
tion (icf) with a 0/1 step function. Although differing in context, this type of 
replacement is statistically identical to the process of mastery testing in which 
test scores (or abilities) are dichotomized into a pass or a failure. A general 
formulation of mastery testing may be found in Huynh (1976). 

Consider now a binary item with increasing (in the large sense) item
characteristic function $P(\theta)$. This function represents the probability that an examinee
with ability $\theta$ will answer the item correctly. Mapping the item at the ability $\tau_1$
for CR interpretation means that an examinee with ability $\theta \ge \tau_1$ is “expected”
to answer the item correctly. In the context of mastery testing, the examinee
is said to pass the item or is a master. On the other hand, an examinee with
ability $\theta < \tau_1$ is “expected” to answer the item incorrectly. In other words, the
examinee is deemed a failure on the item (a non-master) (Huynh, 1998, p.
47). Thus, for CR interpretation, the icf $P(\theta)$ is replaced by the step function

$$s(\theta) = \begin{cases} 0 & \text{if } \theta < \tau_1 \\ 1 & \text{if } \theta \ge \tau_1 \end{cases}$$

Note that the icf $P(\theta)$ is increasing, so the conditions $\theta \ge \tau_1$ and $\theta < \tau_1$ are
equivalent to the requirements $P(\theta) \ge p_1$ and $P(\theta) < p_1$, where $p_1 = P(\tau_1)$.

At the ability $\theta$, the probability of answering the item correctly, $P(\theta)$, is
also the probability of a false negative error. (This is the error encountered
when failing an examinee who answers the item correctly.) The probability of
answering the item incorrectly, $Q(\theta) = 1 - P(\theta)$, is likewise the probability
of a false positive error. (This error occurs when mastery status is granted to
an examinee who answers the item incorrectly.) Now let $C^-(\theta)$ and $C^+(\theta)$ be
the costs (or losses) associated with a false negative and a false positive error, and let
$R(\theta)$ be the ratio $C^-/C^+$. Within a traditional decision-theoretic framework,
it is found (Huynh, 1976, p. 67) that

$$p_1 = 1/[1 + R]$$

or equivalently

$$R = (1 - p_1)/p_1.$$

Thus, the following relationship holds for the probability $p_1$ of answering the
item correctly at the cutoff ability $\tau_1$:

$p_1 < .5$ if $R > 1$
$p_1 = .5$ if $R = 1$
$p_1 > .5$ if $R < 1$.

The above formula provides a way to assess the consequences of selecting a
value for $\tau_1$ or $p_1$. For example, if $p_1 = 50\%$, then it may be deduced that $R = 1$
(i.e., the false positive and false negative errors are weighted equally). On the
other hand, if $p_1 = 2/3$, then $R = 1/2$ (i.e., the false positive errors are twice as
serious as the false negative errors).
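
The relationship between the cutoff probability and the loss ratio can be explored with a small sketch. It assumes, for illustration only, that the two costs are constants that do not depend on ability; the function names are hypothetical. At the cutoff, the expected loss of failing the examinee (the false negative cost times the probability of a correct answer) equals the expected loss of passing the examinee (the false positive cost times the probability of an incorrect answer).

```python
def cutoff_from_ratio(r):
    """Cutoff probability p1 implied by the loss ratio R = C- / C+."""
    return 1.0 / (1.0 + r)


def ratio_from_cutoff(p1):
    """Loss ratio R = C- / C+ implied by a cutoff probability p1."""
    return (1.0 - p1) / p1


def expected_losses(p, cost_fn, cost_fp):
    """Expected loss of failing vs. passing an examinee with success probability p."""
    loss_if_failed = cost_fn * p          # false negative: failed, yet answers correctly
    loss_if_passed = cost_fp * (1.0 - p)  # false positive: passed, yet answers incorrectly
    return loss_if_failed, loss_if_passed


# Illustrative cutoffs: the 50% rule, the 2/3 rule, and an 80% rule.
for p1 in (0.5, 2.0 / 3.0, 0.8):
    r = ratio_from_cutoff(p1)
    failed, passed = expected_losses(p1, cost_fn=r, cost_fp=1.0)
    print(f"p1={p1:.4f}  R={r:.4f}  loss if failed={failed:.4f}  loss if passed={passed:.4f}")
```

A cutoff above 50% always yields a ratio below 1, so a stricter mapping rule implicitly treats false positive errors as the more serious of the two.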

Empirical Bayes Type 2 Item Mappings:

Case of Rasch and 2PL Items

Working within the context of the Bock partition of the item information
(Bock, 1972) for Rasch items, Huynh (1994) arrived at the maximum
information (MI) selection rule based on the value $p_1 = 2/3$. This cutoff probability
also holds for 2PL items. Subsequently, Huynh (1998) developed a general
psychometric theory for selecting 3PL items and the categories of polytomous items
for CR and scale anchoring. For 3PL items, MI selection rules place an item
at the ability where the probability of answering the item correctly is
$p_1 = (2 + c)/3$.

It may be noted that

$$(2 + c)/3 = c + (1 - c) \times 2/3.$$

Thus, the threshold probability of $(2 + c)/3$ for a 3PL item can be deduced
from the cutoff probability of 2/3 for a 2PL item (an item without guessing) by
using the formula for correction for random guessing.

The remainder of this section provides another way to derive selection rules 
for Type 2 item mappings. The alternative process is based upon a construction 
of a “synthetic” population of examinees for whom the item is appropriate. In
the context of formal mathematical statistics, such a process is similar (if not 
identical) to the use of an empirical Bayes approach. 

To locate a given item on a scale for CR interpretation, the following two 
questions are posed. 

Question 1: To which population of examinees is the item most
appropriate?

Question 2: For the population found in Question 1, what is the
typical ability of those who answer the item correctly?

A formal empirical Bayes solution of Type 2 item mappings within the
context of these two questions is provided in Huynh (1998). Starting with a 2PL
item and within the family of conjugate and symmetric priors (or ability
distributions), it is found that the answer to Question 1 is the ability distribution
with a probability density proportional to $[I(\theta)]^{\alpha}$, where $I(\theta)$ is the item
information and $\alpha$ is any positive constant. The corresponding cutoff probability
takes the general form

$$p_1 = (\alpha + 1)/(2\alpha + 1)$$

and the cutoff ability is given as

$$\tau_1 = b + \frac{\log[(\alpha + 1)/\alpha]}{Da}.$$

Since $\alpha$ is positive, it is clear from the above formula that, for Rasch and 2PL
items, the cutoff probability $p_1$ is larger than 50%. The maximum information
(MI) rule based on $p_1 = 2/3$ (the case $\alpha = 1$) satisfies the condition of an
empirical Bayes rule.
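
The two closed forms for Rasch and 2PL items can be checked against each other with a short sketch. The exponent of the prior is written `alpha` here to keep it distinct from the item discrimination `a`; the function names and the item parameters are illustrative assumptions. The check confirms that the 2PL probability evaluated at the cutoff ability reproduces the cutoff probability.

```python
import math

D = 1.7  # logistic scaling constant


def eb_cutoff_probability(alpha):
    """Empirical Bayes cutoff probability for a Rasch or 2PL item."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return (alpha + 1.0) / (2.0 * alpha + 1.0)


def eb_cutoff_ability(alpha, a, b):
    """Ability at which a 2PL item reaches the empirical Bayes cutoff probability."""
    return b + math.log((alpha + 1.0) / alpha) / (D * a)


def p2pl(theta, a, b):
    """Probability of a correct answer under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical 2PL item.
a, b = 1.0, 0.0
for alpha in (0.5, 1.0, 2.0):
    tau1 = eb_cutoff_ability(alpha, a, b)
    print(alpha, round(eb_cutoff_probability(alpha), 4), round(p2pl(tau1, a, b), 4))
```

As `alpha` grows the cutoff probability falls toward 50%, and as it shrinks toward zero the cutoff probability approaches 100%.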



Empirical Bayes Type 2 Item Mappings: 

Case of 3PL Items 

Consider now a 3PL item with item information $I(\theta)$. As in the case of
Rasch or 2PL items, we will consider the ability distribution with probability
density proportional to $[I(\theta)]^{\alpha}$, where $\alpha$ is any positive constant. As documented
in Huynh (1998), the cutoff probability $p_1$ will now take the general form

$$p_1 = \frac{\alpha + c + 1 + [(\alpha + c + 1)^2 - 4c(2\alpha + 1)(1 - \alpha)]^{1/2}}{2(2\alpha + 1)}.$$

As an illustration, let $\alpha = .5$ and c = .25. This value for $\alpha$ corresponds to
the family of prior (ability) distributions considered by Jeffreys (1938, 1949,
1961) and the value c = .25 may be thought of as coming from a multiple-choice
item with four options. For the situation under study, the cutoff probability is
$p_1 = 79.65\%$. It may be interesting to note that earlier NAEP scale anchoring
used a cutoff probability of 80%.
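
The illustrative value can be verified by direct substitution into the general form for 3PL items. The worked arithmetic below is a check added for convenience, with the prior exponent written as $\alpha = 0.5$ and the guessing parameter as $c = 0.25$; rounding is to four decimal places.

```latex
\begin{aligned}
\alpha + c + 1 &= 0.5 + 0.25 + 1 = 1.75,\\
4c(2\alpha + 1)(1 - \alpha) &= 4(0.25)(2)(0.5) = 1,\\
p_1 &= \frac{1.75 + \sqrt{1.75^2 - 1}}{2(2)}
     = \frac{1.75 + \sqrt{2.0625}}{4}
     \approx \frac{1.75 + 1.4361}{4}
     \approx 0.7965.
\end{aligned}
```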

Linking Empirical Bayes Rules for Rasch and 2PL Items to
Rules for 3PL Items Through Correction for Random Guessing

For the ability distribution with $\alpha = .5$ above and with a 2PL item
without guessing (c = 0), the cutoff probability is $p_1 = (\alpha + 1)/(2\alpha + 1) = 75\%$.
If this item had four options (with c = .25) and if the formula for correction
for random guessing were used, then the cutoff probability would be
$p_1 = .25 + .75 \times .75 = 81.25\%$. This value differs from the value $p_1 = 79.65\%$ computed
in the previous section. Thus, in general, the cutoff probability $p_1$ computed
directly from the general empirical Bayes approach for 3PL items differs from
the cutoff probability $p_1$ computed by applying the formula for correction for
random guessing to a 2PL item. These two cutoff probabilities are identical only
when $\alpha = 1$. When this condition holds, the cutoff probabilities are $p_1 = 2/3$ for
Rasch and 2PL items and $p_1 = (2 + c)/3$ for 3PL items. These cutoff probabilities
are the ones associated with the maximum information (MI) selection rules
derived by Huynh (1994, 1998) based on the Bock (1972) partition of the item
information.
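
The comparison can be reproduced for several values of the prior exponent with the sketch below. It sets the direct empirical Bayes cutoff for a 3PL item beside the cutoff obtained by applying the correction for random guessing to the 2PL cutoff. The function names are hypothetical, the exponent is written `alpha`, and the chosen values of `alpha` are illustrative only.

```python
import math


def eb_cutoff_2pl(alpha):
    """Empirical Bayes cutoff probability for an item without guessing."""
    return (alpha + 1.0) / (2.0 * alpha + 1.0)


def eb_cutoff_3pl(alpha, c):
    """Direct empirical Bayes cutoff probability for a 3PL item."""
    s = alpha + c + 1.0
    disc = s * s - 4.0 * c * (2.0 * alpha + 1.0) * (1.0 - alpha)
    return (s + math.sqrt(disc)) / (2.0 * (2.0 * alpha + 1.0))


def guessing_corrected(alpha, c):
    """2PL cutoff extended to a 3PL item by the correction for random guessing."""
    return c + (1.0 - c) * eb_cutoff_2pl(alpha)


c = 0.25  # four-option multiple-choice item
for alpha in (0.5, 1.0, 2.0):
    direct = eb_cutoff_3pl(alpha, c)
    corrected = guessing_corrected(alpha, c)
    print(f"alpha={alpha}: direct={direct:.4f}, corrected={corrected:.4f}")
```

Only the middle row, with `alpha` equal to 1, shows the two cutoffs agreeing, which is the maximum information case.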

Concluding Remarks 

Item mappings are widely used in educational assessment. Items are
typically mapped in different ways depending on the purpose of the mapping. This
paper makes an attempt to categorize item mappings into three broad types by
intended use: (1) to estimate the ability of an examinee, (2) to attach a
criterion-referenced interpretation to test scores and to construct ordered item booklets
for Bookmark standard settings, and (3) to select exemplary items for
designated anchor points on an achievement continuum. A process is provided for
assessing the relative magnitude of the consequences of a given selection rule
for item mapping. Rationales and details of an empirical Bayes approach to the
construction of selection rules are provided for the family of Rasch, 2PL and
3PL items. It is mentioned that the empirical Bayes rules are identical to the
maximum information (MI) rules when the rules for 3PL items are linked to
those of 2PL items through the formula for correction for random guessing.